D-Score: Holistic Dialogue Evaluation Without Reference
Authors
Abstract
In artistic gymnastics, the difficulty score, or D-score, is used for judging performance. Starting from zero, an athlete earns points from different aspects such as composition requirement, difficulty, and connection between moves. The final score is a composite of the quality of various performance indicators. Similarly, when evaluating dialogue responses, human judges generally follow a number of criteria, among which language fluency, context coherence, logical consistency, and semantic appropriateness are at the top of the agenda. In this paper, we propose an automatic dialogue evaluation framework called D-score that resembles the way gymnastics is evaluated. Following the four criteria above, we devise a range of evaluation tasks and model them under a multi-task learning framework. The proposed framework, without relying on any human-written reference, learns to appreciate the overall quality of human-human conversations through a representation that is shared by all tasks, without over-fitting to any individual task domain. We evaluate D-score by performing comprehensive correlation analyses with human judgement on three dialogue evaluation datasets, two of which are from past DSTC series, and benchmark it against state-of-the-art baselines. D-score not only outperforms the best baseline by a large margin in terms of system-level Spearman correlation but also represents an important step towards explainable dialogue scoring.
Similar resources
The Back-translation Score: Automatic MT Evaluation at the Sentence Level without Reference Translations
Automatic tools for machine translation (MT) evaluation such as BLEU are well established, but have the drawbacks that they do not perform well at the sentence level and that they presuppose manually translated reference texts. Assuming that the MT system to be evaluated can deal with both directions of a language pair, in this research we suggest to conduct automatic MT evaluation by determini...
Semantics, Dialogue, and Reference Resolution
Most pronoun resolution research has focused on written corpora while using syntactical and surface cues. Though big gains have been made in this domain with those methods, it is difficult to do better than the 80% coverage in these domains without some world or semantic knowledge. We investigate this issue by incorporating rich semantic information into a proven reference resolution model over...
Sentence-level MT evaluation without reference translations: Beyond language modeling
In this paper we investigate the possibility of evaluating MT quality and fluency at the sentence level in the absence of reference translations. We measure the correlation between automatically-generated scores and human judgments, and we evaluate the performance of our system when used as a classifier for identifying highly dysfluent and illformed sentences. We show that we can substantially ...
A Matlab-Based Tool for Video Quality Evaluation without Reference
This paper deals with the design of a Matlab based tool for measuring video quality with no use of a reference sequence. The main goals are described and the tool and its features are shown. The paper begins with a description of the existing pixel-based no-reference quality metrics. Then, a novel algorithm for simple PSNR estimation of H.264/AVC coded videos is presented as an alternative. The...
XMEANT: Better semantic MT evaluation without reference translations
We introduce XMEANT—a new cross-lingual version of the semantic frame based MT evaluation metric MEANT—which can correlate even more closely with human adequacy judgments than monolingual MEANT and eliminates the need for expensive human references. Previous work established that MEANT reflects translation adequacy with state-of-the-art accuracy, and optimizing MT systems against MEANT robustly...
Journal
Journal title: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Year: 2021
ISSN: 2329-9304, 2329-9290
DOI: https://doi.org/10.1109/taslp.2021.3074012